https://scanpy.readthedocs.io/en/stable/tutorials/spatial/basic-analysis.html
import scanpy as sc
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
#Tools versions summary
sc.logging.print_versions()
sc.set_figure_params(facecolor="white", figsize=(8, 8))
sc.settings.verbosity = 3
WARNING: If you miss a compact list, please try `print_header`! The `sinfo` package has changed name and is now called `session_info` to become more discoverable and self-explanatory. The `sinfo` PyPI package will be kept around to avoid breaking old installs and you can downgrade to 0.3.2 if you want to use it without seeing this message. For the latest features and bug fixes, please install `session_info` instead. The usage and defaults also changed slightly, so please review the latest README at https://gitlab.com/joelostblom/session_info. ----- anndata 0.7.8 scanpy 1.8.2 sinfo 0.3.4 ----- PIL 8.4.0 anyio NA asciitree NA attr 21.2.0 babel 2.9.1 backcall 0.2.0 beta_ufunc NA binom_ufunc NA brotli 1.0.9 certifi 2021.10.08 cffi 1.15.0 chardet 4.0.0 charset_normalizer 2.0.8 cloudpickle 2.0.0 colorama 0.4.4 cycler 0.10.0 cython_runtime NA dask 2021.11.2 dateutil 2.8.2 debugpy 1.5.1 decorator 5.1.0 defusedxml 0.7.1 entrypoints 0.3 fasteners NA fsspec 0.7.4 google NA h5py 3.6.0 idna 3.3 igraph 0.9.8 importlib_resources NA ipykernel 6.5.1 ipython_genutils 0.2.0 ipywidgets 7.6.5 jedi 0.18.1 jinja2 3.0.3 joblib 1.1.0 json5 NA jsonschema 4.2.1 jupyter_server 1.12.1 jupyterlab_server 2.8.2 kiwisolver 1.3.2 leidenalg 0.8.8 llvmlite 0.36.0 louvain 0.7.0 markupsafe 2.0.1 matplotlib 3.5.0 matplotlib_inline NA mpl_toolkits NA natsort 8.0.0 nbclassic NA nbformat 5.1.3 nbinom_ufunc NA numba 0.53.1 numcodecs 0.9.1 numexpr 2.7.3 numpy 1.21.4 packaging 21.3 pandas 1.2.5 parso 0.8.2 pexpect 4.8.0 pickleshare 0.7.5 pkg_resources NA prometheus_client NA prompt_toolkit 3.0.23 ptyprocess 0.7.0 pvectorc NA pyarrow 6.0.1 pydev_ipython NA pydevconsole NA pydevd 2.6.0 pydevd_concurrency_analyser NA pydevd_file_utils NA pydevd_plugins NA pydevd_tracing NA pygments 2.10.0 pyparsing 3.0.6 pyrsistent NA pytz 2021.3 requests 2.26.0 scipy 1.7.3 seaborn 0.11.2 send2trash NA setuptools_scm NA six 1.16.0 sklearn 1.0.1 sniffio 1.2.0 sparse 0.13.0 statsmodels 0.13.1 storemagic NA tables 3.6.1 terminado 0.12.1 texttable 1.6.4 threadpoolctl 3.0.0 tlz 0.11.2 toolz 0.11.2 tornado 6.1 traitlets 5.1.1 typing_extensions NA urllib3 1.26.7 vscode NA wcwidth 0.2.5 websocket 1.2.1 yaml 6.0 zarr 2.10.3 zipp NA zmq 22.3.0 ----- IPython 7.30.0 jupyter_client 7.1.0 jupyter_core 4.9.1 jupyterlab 3.2.4 notebook 6.4.6 ----- Python 3.8.12 (default, Nov 26 2021, 20:28:57) [GCC 10.2.1 20210110] Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.29 8 logical CPU cores ----- Session information updated at 2024-07-21 15:45
#Download data from visium DB and return to anndata object
adata = sc.datasets.visium_sge(sample_id="V1_Human_Lymph_Node")
#Set dataset
adata.var_names_make_unique()
adata.var["mt"] = adata.var_names.str.startswith("MT-")
sc.pp.calculate_qc_metrics(adata, qc_vars=["mt"], inplace=True)
reading /data/V1_Human_Lymph_Node/filtered_feature_bc_matrix.h5
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
(0:00:05)
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
#Data summary
adata
AnnData object with n_obs × n_vars = 4035 × 36601
obs: 'in_tissue', 'array_row', 'array_col', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt'
var: 'gene_ids', 'feature_types', 'genome', 'mt', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts'
uns: 'spatial'
obsm: 'spatial'
#Visualize the data
fig, axs = plt.subplots(1, 4, figsize=(15, 4))
#Total counts
sns.histplot(adata.obs["total_counts"], kde=False, ax=axs[0])
sns.histplot(adata.obs["total_counts"][adata.obs["total_counts"] < 10000], kde=False, bins=40, ax=axs[1])
#Genecounts
sns.histplot(adata.obs["n_genes_by_counts"], kde=False, bins=60, ax=axs[2])
sns.histplot(adata.obs["n_genes_by_counts"][adata.obs["n_genes_by_counts"] < 4000], kde=False, bins=60, ax=axs[3])
<AxesSubplot:xlabel='n_genes_by_counts', ylabel='Count'>
#Filter by total counts
sc.pp.filter_cells(adata, min_counts=5000)
sc.pp.filter_cells(adata, max_counts=35000)
adata = adata[adata.obs["pct_counts_mt"] < 20].copy()
#Filter by gene counts
sc.pp.filter_genes(adata, min_cells=10)
filtered out 44 cells that have less than 5000 counts filtered out 130 cells that have more than 35000 counts filtered out 16916 genes that are detected in less than 10 cells
normalize_total 방법을 통해 데이터를 normalization한다.sc.pp.normalize_total(adata, inplace=True)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, flavor="seurat", n_top_genes=2000)
normalizing counts per cell
finished (0:00:00)
If you pass `n_top_genes`, all cutoffs are ignored.
extracting highly variable genes
finished (0:00:03)
--> added
'highly_variable', boolean vector (adata.var)
'means', float vector (adata.var)
'dispersions', float vector (adata.var)
'dispersions_norm', float vector (adata.var)
#PCA, calculate neightbor scores, UMAP
sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
computing PCA
on highly variable genes
with n_comps=50
finished (0:00:22)
computing neighbors
using 'X_pca' with n_pcs = 50
2024-07-21 15:46:28.109484: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/R/lib/R/lib 2024-07-21 15:46:28.109566: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
finished: added to `.uns['neighbors']`
`.obsp['distances']`, distances for each pair of neighbors
`.obsp['connectivities']`, weighted adjacency matrix (0:00:44)
computing UMAP
computing UMAP
finished: added
'X_umap', UMAP coordinates (adata.obsm) (0:00:26)
#Clustering
sc.tl.louvain(adata, key_added='louvain_r1')
running Louvain clustering
using the "louvain" package of Traag (2017)
finished: found 10 clusters and added
'louvain_r1', the cluster labels (adata.obs, categorical) (0:00:01)
#Visualize total counts, gene counts, and clustering results together
plt.rcParams["figure.figsize"] = (4, 4)
sc.pl.umap(adata, color=["total_counts", "n_genes_by_counts", "louvain_r1"], wspace=0.4)
... storing 'feature_types' as categorical ... storing 'genome' as categorical
spatial를 이용하여 스캔된 조직 슬라이드 이미지 위에 spatial 정보를 올린다.plt.rcParams["figure.figsize"] = (8, 8)
#Visualize the saptial information following total counts and gene counts
sc.pl.spatial(adata, img_key="hires", color=["total_counts", "n_genes_by_counts"])
#Visualize the saptial information following louvain clustering results
sc.pl.spatial(adata, img_key="hires", color="louvain_r1", size=1.5)
saptial 함수에서는 group 옵션을 통해 특정 cluster만 모아서 볼 수도 있다. alpha 옵션을 조정하면 색의 투명도를 조정해 조직 슬라이드 그림과 겹쳐서 확인할 수 있다.#Visualize spatial information of cluster number 5 and 9
sc.pl.spatial(adata,
img_key="hires",
color="louvain_r1",
groups=["5", "9"],
crop_coord=[7000, 10000, 0, 6000],
alpha=0.5,
size=1.3)
sc.tl.rank_genes_groups(adata, "louvain_r1", method="t-test")
sc.pl.rank_genes_groups_heatmap(adata, groups="9", n_genes=10, groupby="louvain_r1")
ranking genes
finished: added to `.uns['rank_genes_groups']`
'names', sorted np.recarray to be indexed by group ids
'scores', sorted np.recarray to be indexed by group ids
'logfoldchanges', sorted np.recarray to be indexed by group ids
'pvals', sorted np.recarray to be indexed by group ids
'pvals_adj', sorted np.recarray to be indexed by group ids (0:00:10)
WARNING: dendrogram data not found (using key=dendrogram_louvain_r1). Running `sc.tl.dendrogram` with default parameters. For fine tuning it is recommended to run `sc.tl.dendrogram` independently.
using 'X_pca' with n_pcs = 50
Storing dendrogram info using `.uns['dendrogram_louvain_r1']`
WARNING: Groups are not reordered because the `groupby` categories and the `var_group_labels` are different.
categories: 0, 1, 2, etc.
var_group_labels: 9
#Feature the gene, 'CR2', on the spatial image (right)
sc.pl.spatial(adata, img_key="hires", color=["louvain_r1", "CR2"])
#Feature the gene, 'COL1A2', 'SYPL1', on the spatial image
sc.pl.spatial(adata, img_key="hires", color=["COL1A2", "SYPL1"], alpha=0.7)